Improved Search Strategies and Extensions to K-medoids-based Algorithms - Extended Report

Authors

  • Shu-Chuan Chu
  • John F. Roddick
  • Jeng-Shyang Pan
Abstract

In this paper, a number of improvements are suggested that can be applied to most k-medoids-based algorithms. These fall into two categories: conceptual/algorithmic improvements and implementational improvements. They include revisiting the accepted cases for swap comparison and applying partial distance searching and previous medoid indexing to clustering. We propose extensions to the problem of nearest neighbor search by combining the previous medoid index with triangular inequality elimination and partial distance searching. An improved k-medoids algorithm using simulated annealing, CLASA, is also discussed, as is a novel mechanism for managing memory usage. Various hybrids of these search approaches are then applied to a number of k-medoids-based algorithms, and we show that the method is generally applicable. For example, experimental results on both artificial and real datasets demonstrate that, when applied to CLARANS, the number of distance calculations can be reduced by up to 98% with a similar average distance per object. Importantly, these search approaches can also be applied to nearest neighbor searching and other clustering algorithms.
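As a rough illustration of how the three search ideas named in the abstract might fit together, the Python sketch below finds the nearest medoid for one object. It is not the authors' implementation. The function names, the `start` parameter (standing in for a previous medoid index, assumed here to mean "test the object's last assigned medoid first"), the use of Euclidean distance, and the precomputed `medoid_dist` table are all assumptions made for this example.

```python
import math


def partial_sq_distance(x, m, best_sq):
    """Accumulate the squared Euclidean distance between x and m.

    Returns None as soon as the running sum reaches best_sq, because the
    candidate can then no longer beat the current best (partial distance search).
    """
    total = 0.0
    for a, b in zip(x, m):
        total += (a - b) ** 2
        if total >= best_sq:
            return None
    return total


def nearest_medoid(x, medoids, medoid_dist, start=0):
    """Return (index, distance) of the medoid nearest to x.

    medoid_dist[i][j] is the precomputed distance between medoids i and j.
    start is the medoid tested first (hypothetical stand-in for the
    previous medoid index of the object).
    """
    best = start
    best_d = math.dist(x, medoids[start])
    for j in range(len(medoids)):
        if j == start:
            continue
        # Triangle inequality: d(x, m_j) >= d(m_best, m_j) - d(x, m_best).
        # If d(m_best, m_j) >= 2 * d(x, m_best), m_j cannot be closer.
        if medoid_dist[best][j] >= 2.0 * best_d:
            continue
        sq = partial_sq_distance(x, medoids[j], best_d * best_d)
        if sq is not None:
            best, best_d = j, math.sqrt(sq)
    return best, best_d
```

Starting from a good first guess makes `best_d` small early, which lets both the triangle inequality test and the partial distance test reject more candidates without a full distance computation.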


Similar resources

New search strategies and a new derived inequality for efficient k-medoids-based algorithms

In this paper, a new inequality is derived which can be used for the problem of nearest neighbor searching. We also present a searching technique, referred to as a previous medoid index, to reduce the computation time, particularly for k-medoids-based algorithms. A novel method is also proposed to reduce the computational complexity through the use of memory. Four new search strategies for k...

Improved COA with Chaotic Initialization and Intelligent Migration for Data Clustering

A well-known clustering algorithm is K-means. This algorithm, besides advantages such as high speed and ease of employment, suffers from the problem of local optima. In order to overcome this problem, a lot of studies have been done in clustering. This paper presents a hybrid Extended Cuckoo Optimization Algorithm (ECOA) and K-means (K), which is called ECOA-K. The COA algorithm has advantages ...


Parallel Multi-Swarm PSO Based on K-Medoids and Uniform Design

PAM (Partitioning around Medoid) is introduced to divide the swarm into several different subpopulations. PAM is one of the k-medoids clustering algorithms based on partitioning methods. It attempts to divide n objects into k partitions. This algorithm overcomes the sensitivity to the initial partitions that affects the k-means algorithm. In the parallel PSO algorithms, the swarm needs to be divi...

Efficient Web Usage Mining Based on K-Medoids Clustering Technique

Web Usage Mining is the application of data mining techniques to find usage patterns from web log data, so as to grasp required patterns and serve the requirements of Web-based applications. User’s expertise on the internet may be improved by minimizing user’s web access latency. This may be done by predicting the future search page earlier and the same may be prefetched and cached. Therefore, ...


Fast Online k-nn Graph Building

In this paper we propose an online approximate k-nn graph building algorithm, which is able to quickly update a k-nn graph using a flow of data points. One very important step of the algorithm consists in using the current distributed graph to search for the neighbors of a new node. Hence we also propose a distributed partitioning method based on balanced k-medoids clustering, that we use to op...




Publication date: 2002